Designing and implementing an experimental survey on knowledge and perceptions about alcohol warning labels

Abstract
Objectives: This paper describes the design and implementation of an online survey experiment to investigate the effects of alcohol warning labels on alcohol‐related knowledge, risk perceptions and intentions.
Method: The survey collected self‐reported data from 14 European countries through two waves of data collection with different recruitment strategies: dissemination via social media and public health agencies was followed by paid‐for Facebook ads. The latter strategy was adopted to achieve broader population representation. Post‐stratification weighting was used to match the sample to population demographics.
Results: The survey received over 34,000 visits and resulted in a sample of 19,601 participants with complete data on key sociodemographic characteristics. The first wave over‐represented females and people with higher education; dissemination was therefore complemented by paid‐for Facebook ads targeting more diverse populations, although this second wave had a higher attrition rate.
Conclusion: Experiments can be integrated into general population surveys. Pan‐European results can be achieved with limited resources by combining sampling methods that compensate for different biases with statistical adjustments.


| INTRODUCTION
Alcohol is a major risk factor for the burden of disease, causing about 3 million deaths and more than 130 million disability-adjusted life years (DALYs) lost worldwide in 2016 (K. Shield et al., 2020). The European Union is the region with the highest level of alcohol consumption, with seven of the 10 countries with the highest alcohol consumption being member states (WHO Global Health Observatory, 2019). To reduce alcohol-related harm, different policy options have been proposed, with pricing policies and restrictions on availability and marketing usually considered the most effective and cost-effective (Babor et al., 2022; Chisholm et al., 2018). Labels on alcohol products, including health warnings, are one of the policy options endorsed by the World Health Organization (WHO) (WHO European Region, 2022b; WHO's Alcohol Drugs and Addictive Behaviours Unit, 2022), although their evidence base is scarcer than that for tobacco, where health warnings have been established as an important part of a comprehensive policy strategy (Flor et al., 2021). The majority of the existing studies experimentally examining the impact of labels on different variables have been done in English-speaking countries (Kokole et al., 2021), and despite a few country-specific studies focusing on particular populations in mainland Europe (Annunziata et al., 2019; Glock & Krolak-Schwerdt, 2013; Lacoste-Badie et al., 2022; Morgenstern et al., 2021; Staub et al., 2022), there remains a notable shortage of comprehensive studies on alcohol labelling in this region (Jané-Llopis et al., 2020).
In 2022, the WHO Regional Office for Europe launched the Evidence into Action Alcohol Project (EVID-ACTION) (WHO European Region, 2022c), with one of its key objectives being the development of the evidence base to support the implementation of effective alcohol health warnings. As a first step, an online experiment was designed to provide insight into the impact of different health messages on participants' knowledge of alcohol-related harms, their risk perception and drinking intentions, as well as on the participants' perceptions of the different labels. A research protocol was developed and uploaded to the Open Science Framework (OSF) before the start of the study (WHO European Region, 2022a).
We opted to employ a web-based study to efficiently conduct a randomized experiment on the effect of different alcohol labels across multiple countries, in multiple languages. However, as we also wanted to determine knowledge and risk perceptions on a population level, we needed to ensure a sufficient quota of respondents from populations that are usually underrepresented in web-based convenience samples, such as the elderly or less-educated groups (Wyatt, 2000). Such convenience sampling methods, although not statistically representative, can yield data comparable to those obtained with probability survey methods (Barratt et al., 2017). Furthermore, truly representative samples are extremely difficult, if not impossible, to obtain given the high rate of nonresponse and incomplete sampling frames (Rehm, Kilian, Rovira, et al., 2021).
Thus, this paper describes the procedures employed for data collection and weighting needed to achieve a final sample that can approximate the population characteristics in the 14 participating countries. Their use can provide an efficient approach for multi-country research.

| Aim and development of the survey
With the objective of investigating the impact of different health messages on participants' knowledge of alcohol-related harms, their risk perception and drinking intentions, as well as on the participants' perceptions of the different labels, an online survey with an experimental component was conducted via WHO's online platform Data-Form (LimeSurvey) (Schmitz, 2015), covering 14 countries (Austria, Belgium, Estonia, France, Germany, Ireland, Latvia, Lithuania, Netherlands, Norway, Portugal, Slovenia, Spain, and Sweden). The questionnaire was originally designed and developed in English and was translated into 12 languages (Catalan, German, Dutch/Flemish, French, Estonian, Latvian, Lithuanian, Norwegian, Portuguese, Slovenian, Spanish, and Swedish) by country counterparts from national institutions, and the translations were double-checked by native speakers in the research team or at the national level. Based on an a priori power analysis of effects and planned statistical analyses for country interactions as described in the protocol (WHO European Region, 2022a), we aimed to reach a minimum sample size of 384 participants per country. Moreover, 1050 participants per country would be necessary to enable country-specific analyses for some of the variables.
The survey used a mixed design, with label condition as a between-subjects factor and pre-post knowledge, as well as other variables, as within-subjects measures. The experiment consisted of six different randomized label conditions: one control with no message, and five experimental labels with different written messages and/or images displayed on the front label of an alcoholic beverage container. Of the five experimental labels, three included a message specifically linking alcohol consumption to cancer risk. The details regarding the labels' attributes can be found in Table A1.
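As a minimal sketch of the between-subjects allocation described above, the snippet below assigns each respondent to one of six conditions with equal probability. The condition names, the seed and the use of simple (unblocked) randomization are illustrative assumptions; the study relied on the survey platform for allocation, and its exact mechanism is not described here.

```python
import random
from collections import Counter

# Hypothetical condition names: one control and five experimental labels.
CONDITIONS = ["control", "label_1", "label_2", "label_3", "label_4", "label_5"]


def assign_condition(rng: random.Random) -> str:
    """Draw one label condition with equal probability (simple randomization)."""
    return rng.choice(CONDITIONS)


# A fixed seed is used only so that the illustration is reproducible.
rng = random.Random(2022)
assignments = [assign_condition(rng) for _ in range(600)]

# Group sizes are expected to be roughly, but not exactly, equal.
print(Counter(assignments))
```

Simple randomization does not guarantee equal group sizes, which is why the distribution of participants by label condition is worth inspecting after data collection.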
The measures used in the pre- and post-manipulation included knowledge of health conditions related to alcohol including cancer, perceived general health risk, perceived cancer risk due to alcohol consumption, and intention to reduce alcohol consumption. Post-manipulation-only measures included self-reported attention to the label and items evaluating respondents' perceptions of different labels.
Demographic information (age, gender and educational attainment level as a measure of socio-economic status), AUDIT-C (Babor et al., 2001) and perceived social norms regarding alcohol consumption were also assessed as covariate and/or potential moderator variables. The order of presenting survey questions was fixed across all questionnaire versions, ensuring consistency among participants. However, to minimize order effects, the order of multiple-choice answers within specific questions was randomized. The complete questionnaire can be found in the Supporting Information S1.
The selection of countries aimed to include the three main drinking cultures present in the European Union (Grant, 2013; Popova et al., 2007), as well as variations within these cultures.
Portugal, France, Slovenia, and Spain follow the Mediterranean style, characterized by regular alcohol consumption, particularly wine with meals, and cultural disapproval of public drunkenness.However, some generational changes have been observed in Spain, shifting from wine to beer preference among younger individuals, and higher acceptance of public drunkenness (Llamosas-Falcón & Rehm, 2023).
The Central and Western European pattern is predominantly associated with beer-drinking countries, where beer consumption is frequent, with or without meals. This pattern is typical of Germany but also observed in Austria, Belgium, the Netherlands, and Ireland.
The Northern European pattern revolves around spirits consumption, characterized by non-daily drinking, sporadic binge drinking episodes, and a higher tolerance for public drunkenness. Norway and Sweden fall into this pattern but have recently undergone a change towards wine drinking. This group also contains the Baltic countries of Estonia, Latvia, and Lithuania, which show historically higher levels of alcohol consumption due to their affiliation with the former Soviet Union (Rehm, Štelemėkas, et al., 2021).
The inclusion criteria for the study were being aged 18 years or older and having consumed alcohol within the past 12 months, that is, being a current drinker. Although the legal drinking age in some countries, such as Germany and Belgium, is 16, individuals under 18 years old would have required parental consent to participate, and thus 18 years was set as the minimum age across all countries. The only exception was Lithuania, where the legal age of consent is 20 years old, and so the survey distributed in Lithuania was changed to exclude participants younger than 20 years of age.
Participants were volunteers (i.e., were not compensated for completing the survey) who followed a survey link to an online experiment, which could be completed on a smartphone or tablet/computer. To maintain the full anonymity of participants, no IP addresses were collected or retained, and participants provided basic demographic information (gender, estimated household income, age within 5-year age categories, country, and type of place of residence).

| Recruitment strategies for data collection
The survey respondents were composed of a non-probabilistic convenience sample that was collected in 14 countries in two separate but consecutive survey waves, with different dissemination methods.
Despite not aiming to recruit a truly representative sample, efforts were made to reach a wide range of population groups by age group, sex, and educational attainment level, as these factors are known to be associated with alcohol consumption (T.F. Babor et al., 2022).
For the first sample wave, national partners were responsible for dissemination in their respective countries. The survey was distributed in each country's native language(s), but language selection was available at the beginning of the online questionnaire. Participants were recruited via multiple online channels, with the survey disseminated as widely as possible using a snowball effect through WHO collaborating centres, Ministries of Health in the respective countries, and nongovernmental organizations (NGOs) with members in several of the participating countries. These channels included mailing lists, official organization-based social media accounts, government websites, and other general survey contact lists. Detailed information on the dissemination channels used for each country, the date of dissemination and the languages in which the survey was available can be found in Table A2, in the Supporting Information S1.
Preliminary descriptive analyses of the data collected in the first wave indicated that the sample was composed mostly of females and people with tertiary education. As a consequence, some population subgroups (e.g., males with high school education or lower) were only represented by a small number of individuals per country in the sample, which compromised the ability to draw conclusions based on the data at the country level. Given the skewed demographic representation of respondents, interpretations of the perceptions of the different labels in different European countries, one of the primary objectives of the survey, would suffer from this major limitation.
Thus, a second sample wave was conducted on Meta Platforms, namely using paid-for Facebook ads, for the same set of countries as in the first wave. It aimed to complement the first sample wave by targeting people who had lower educational attainment levels (less than tertiary education), with the goal set at 2000 clicks per country over 2 weeks, for a total budget of 14,000 euros (1000 euros per country). To better manage the available budget, all countries started the campaigns with a daily budget of 50 euros, aiming for an average of 143 daily clicks, and further adjustments were made depending on the evolution of the cost per click (CPC). Most countries successfully reached the goal with the initial daily budget of 50 euros, but five (Austria, Estonia, the Netherlands, Sweden, and Norway) needed multiple campaigns with budget adjustments to reach the goal. The detailed results per country of paid-for Facebook ads are shown in Table 1. The overall cost per click was 0.28 euros, and the overall completion rate was around 30%, with the lowest rate observed in Lithuania (9.1%) and the highest in Germany (62.8%). Combining the completion rate and the cost per click yielded a measure of effectiveness defined as the cost per completed response, which ranged between 0.33 and 3.01 euros across countries and was 0.93 euros per completed response overall.
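The campaign figures above are linked by simple arithmetic, which the following sketch reproduces from the aggregate values quoted in the text (2000 clicks over 14 days, an overall CPC of 0.28 euros and a completion rate of about 30%). The function names are illustrative and are not part of the study's tooling; because the completion rate is only given approximately, the result should be read as a consistency check and not as an exact reconstruction.

```python
def daily_click_target(total_clicks: int, days: int) -> float:
    """Average number of clicks per day needed to reach the campaign goal."""
    return total_clicks / days


def cost_per_completed_response(cpc_eur: float, completion_rate: float) -> float:
    """Cost per completed response = cost per click / share of clicks completed."""
    if not 0 < completion_rate <= 1:
        raise ValueError("completion_rate must be in (0, 1]")
    return cpc_eur / completion_rate


# Aggregate values quoted in the text.
target = daily_click_target(total_clicks=2000, days=14)
overall = cost_per_completed_response(cpc_eur=0.28, completion_rate=0.30)

print(f"daily click target: {target:.0f}")
print(f"cost per completed response: {overall:.2f} EUR")
```

The same function shows why a low completion rate is expensive: at a fixed CPC, halving the completion rate doubles the cost of each usable response.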

| Post-stratification weighting
Given that this project aimed to understand and compare country-level differences on various measures, including perceptions of labels with different messages on alcoholic beverages and baseline levels of knowledge of the risk of cancer due to alcohol consumption, we aimed to match the sample to population-level statistics for each country. In doing so, we would reduce the bias that arises when a country's sample is skewed towards a specific demographic, and thus obtain more accurate estimates of descriptive statistics at the country level. In addition, when studying causal effects, weighting can improve the estimation of regression coefficients by correcting for heteroskedasticity, or it can be used to identify average partial effects in the presence of unmodeled heterogeneity of effects (Solon et al., 2015).
For the sample from each country, the number of respondents across our key sociodemographic variables was compared to the actual population distribution from EUROSTAT (Eurostat, 2022).
Standardized post-stratification adjustment weights were computed to approximate the sample to the EUROSTAT population distribution by sex (female or male), age group (18-34, 35-54, 55 or above) and educational attainment level (non-tertiary education or tertiary education), for a sub-sample with complete data on these key sociodemographic characteristics. The choice to group ages into 18-34, 35-54, and 55 or above aligns with EUROSTAT's 5-year intervals and helps distinguish between young, middle-aged, and older adults. This simplifies our data for compatibility and allows us to explore distinct alcohol-related behaviours across these age categories, as they are known to differ in their drinking habits (T.F. Babor et al., 2022). Weights were calculated by country, meaning that the sum of the weights in each country equals its sample size.
During the weighting process, high weights, reaching up to 19, were assigned to certain groups to address sample underrepresentation. This assignment, crucial for achieving post-stratification goals, introduced the potential for undue influence from observations exhibiting potential outlier behaviour. To mitigate this, a decision was made to trim weights exceeding 5, and subsequently, the distribution of weights by country was standardized again. While no universally prescribed cut-off exists, this value represented a pragmatic choice to balance accurate post-stratification weighting with outlier prevention, in line with practical considerations in survey research (Korn & Graubard, 1999). The chosen cut-off ensured result stability without compromising the representativeness of the sample. Weights exceeding 5 were trimmed for 24 out of the 168 sociodemographic groups used in the weighting procedure.
A significance level of 5% was assumed, as was independence between observations. All analyses were performed using R version 4.2.1 for Windows. The distribution of the final sample by sample wave and sociodemographic characteristics can be found in Table 2, and the evolution of the cumulative number of respondents is plotted in Figure 1.
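The weighting step described above can be sketched as follows for a single country. This is an illustration under stated assumptions and not the study's own code (the analyses were run in R): each respondent is assigned to a cell defined by sex, age group and education, the cell weight is the ratio of the population share to the sample share, and the weights are rescaled to sum to the country's sample size. The function name, the toy respondents and the population shares are all hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return one weight per respondent for a single country.

    sample_cells: list with the cell label of each respondent.
    population_shares: dict mapping each cell label to its population
        proportion (hypothetical values here; EUROSTAT in the study).
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    # Cell weight = population share / sample share.
    cell_weight = {
        cell: population_shares[cell] / (count / n)
        for cell, count in counts.items()
    }
    weights = [cell_weight[cell] for cell in sample_cells]
    # Standardize so that the weights sum to the country's sample size
    # (this only changes anything if a population cell is empty in the sample).
    scale = n / sum(weights)
    return [w * scale for w in weights]


# Toy example with two cells (sex, age group, education): the sample
# over-represents women with tertiary education.
cell_a = ("female", "18-34", "tertiary")
cell_b = ("male", "55+", "non-tertiary")
sample = [cell_a] * 8 + [cell_b] * 2
shares = {cell_a: 0.5, cell_b: 0.5}  # hypothetical population shares

weights = poststratification_weights(sample, shares)
print(weights[0], weights[-1], sum(weights))
```

In the toy data, respondents in the over-represented cell are weighted down and those in the under-represented cell are weighted up, while the total still equals the sample size.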
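A minimal sketch of the trimming step, assuming a single pass in which weights above the cut-off are capped and all weights are then rescaled to sum to the sample size again. The cut-off of 5 comes from the text; the function name and the toy weights are hypothetical.

```python
def trim_and_restandardize(weights, cap=5.0):
    """Cap weights at `cap`, then rescale so they sum to the sample size."""
    trimmed = [min(w, cap) for w in weights]
    scale = len(trimmed) / sum(trimmed)
    return [w * scale for w in trimmed]


# Toy country sample of nine respondents whose weights sum to 9,
# with one extreme weight (hypothetical values).
original = [7.0] + [0.25] * 8
restandardized = trim_and_restandardize(original, cap=5.0)

print(max(original), max(restandardized), sum(restandardized))
```

Because rescaling multiplies every weight by the same factor, the largest weight can end up somewhat above the cap after a single pass; the effect is exaggerated in a toy sample this small. What the procedure does guarantee is a smaller ratio between the largest and smallest weights, which limits the influence of individual observations.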

The first sample wave began collection on October 24, 2022 for four countries and, three weeks later, on November 14, for three others.

| Comparison between sample waves
As previously mentioned, the second wave of data collection aimed to complement the first wave by targeting specific population groups that were under-represented relative to country statistics because of the dissemination methods applied in the first wave. A comparison of marginal distributions can be found in Figure A1. Moreover, the unweighted and weighted distributions of participants by label condition can be found in Figure A2.

| Coverage of the population by the sample data
CORREIA ET AL.
Adjustments made through weighting did not change the distribution, resulting in balanced sample sizes before and after weighting.

| DISCUSSION
This paper provides a comprehensive description of the design of an experimental survey and the dissemination strategies employed to collect a convenience sample across 14 countries. The implementation of the two strategies yielded distinct results in terms of response rates and sample size, but complementary sample characteristics. The differences observed underscore the importance of thoughtful selection and execution of dissemination methods, as they directly impact the composition and characteristics of the obtained sample.
The initial wave of data collection revealed the presence of a biased sample compared to the distribution of sex, education and age in the general population of the respective countries, primarily influenced by the distribution channels employed. It was observed that most channels used for this sample wave predominantly attracted participants with higher levels of education, indicating the need to avoid relying solely on these specific channels in future studies. Additionally, the effectiveness of dissemination efforts varied depending on the reach of national counterparts, resulting in notable discrepancies in sample sizes across countries. Notably, despite the high number of respondents and relatively high statistical power, large groups of the general population were missing, as highlighted by previous research, underscoring the importance and value of investigating sample distributions and employing weighting procedures (Kilian et al., 2021). Close monitoring of responses during data collection is essential to identify underrepresented subpopulations and make necessary adjustments.
Targeted sampling methods have the potential to bring the sample closer to its population characteristics. Paid-for Facebook ads emerge as a viable approach to recruit survey participants due to their ease of implementation, efficient management by small teams within short timeframes, and relatively low costs compared to traditional survey methods (Kapp et al., 2013; Schneider & Harknett, 2022). However, targets might not be fully achieved, and the efficacy of paid-for Facebook ads in recruiting targeted participants varied across countries. In our study, the average cost per valid response was 0.93 EUR, ranging from 0.33 to 3.01 EUR across countries. Other studies have reported higher costs, such as 2.18 CAD (roughly 1.51 EUR at the time of this study) (Shaver et al., 2019).
These cost discrepancies point to the need to consider country-specific factors when determining the feasibility and cost-effectiveness of recruitment strategies.
The observed heterogeneity between samples highlights the potential bias introduced by self-selected, non-probabilistic surveys.
Such surveys often necessitate the application of statistical adjustments, such as post-stratification weighting, to mitigate biases (Greenacre, 2016; Mäkelä, 2021; Rehm, Kilian, & Manthey, 2021; Rehm, Kilian, Rovira, et al., 2021; Wright, 2005), as implemented in our study. However, weighting resulted in relatively high weights for some particular sub-populations, due to the inherent heterogeneity of the sample. Hence, weight trimming was required, resulting in small deviations when comparing the cross-sectional sample distributions of key socioeconomic characteristics with population data at the country level. Moreover, multi-country surveys can result in massive samples, which might compromise the most traditional statistical analyses, as they are overpowered for testing many hypotheses (Case & Ambrosius, 2007). It is important to exercise caution and consider the practical significance and effect size in addition to statistical significance when interpreting results from large samples (Sullivan & Feinn, 2012).
The study has some limitations. First, we cannot claim a representative survey, as such a characteristic depends on probabilistic sampling strategies, an inclusive sampling frame and the absence of response biases (Kruskal & Mosteller, 1979). Such representative surveys do not seem to be possible in alcohol studies in the EU.
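The point about overpowered samples can be illustrated with a short calculation. The sketch below uses a normal approximation for comparing two equal-sized groups whose means differ by a standardized effect size d (Cohen's d, assuming unit variance in both groups); the effect size and group sizes are hypothetical and are not taken from the study data.

```python
import math


def two_sided_p_value(d: float, n_per_group: int) -> float:
    """Two-sided p-value for a standardized mean difference d between two
    equal-sized groups, using a normal approximation with unit variances."""
    z = d / math.sqrt(2.0 / n_per_group)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


d = 0.05  # a negligible effect by conventional benchmarks (hypothetical)
p_small = two_sided_p_value(d, n_per_group=100)
p_large = two_sided_p_value(d, n_per_group=10_000)

print(f"n = 100 per group:    p = {p_small:.3f}")
print(f"n = 10,000 per group: p = {p_large:.5f}")
```

The effect size is identical in both cases; only the sample size changes the p-value, which is why effect sizes need to be reported and interpreted alongside significance tests in samples of this scale.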
First, most probabilistic sampling frames based on households exclude key groups of heavy alcohol consumers, that is the homeless and some institutionalised populations (K.D. Shield & Rehm, 2012).
Second, high non-response bias usually leads to the underestimation of alcohol use, as evidenced by the discrepancy between the level of consumption estimated via surveys and via sales in a country (Rehm et al., 2007). Thus, the presented solution is an attempt to produce the best possible results with limited resources given the current possibilities (Rehm, Kilian, Rovira, et al., 2021). Finally, it needs to be mentioned that for the hypothesis testing of the experimental part, representativeness is not necessary (Rothman et al., 2013). Another limitation is that our sample is based on drinkers and was weighted against the general population. While we cannot exclude a bias here, as abstainers may have different sociodemographic characteristics than alcohol consumers, such a bias will be small, given that only a relatively small minority in the EU abstains from alcohol (WHO European Region, 2019).

All participants provided informed consent before participating in the study, and an exemption at the EU level was provided by the Data Protection Office in Departament de Salut, Generalitat de Catalunya (Barcelona, Spain, DPD #21/2022). Ethical clearance was granted by the WHO Collaborating Centre for Addiction and Mental Health in Toronto, Canada (Centre for Addiction and Mental Health, Research Ethics Board, #095/2022).

The data collection for the remaining countries started between January 4 and February 6, 2023, and increases in the number of respondents were observed until the end of March 2023. Most countries surpassed 300 participants in the first sample wave, except for Lithuania (n = 291), Slovenia (n = 277) and the Netherlands (n = 42). The second sample wave was collected for all countries simultaneously, during a 2-week period between May 10 and May 24, 2023. It represents 58.8% of the total sample size, comprising between 40% and 70% of the final sample in most countries. The exceptions were Portugal, for which it represents only a third of the sample size (33.2%), and the Netherlands, where the paid-for Facebook ads sample represents the great majority of the data collected (94.5%). In absolute terms, the second sample wave collected more than 400 participants in most countries, with the exception of Lithuania (n = 238), due to the low completion rate (9.1%).

TABLE 1 Dissemination results of paid-for Facebook ads, by country.
Columns: Budget spent (EUR) | Number of clicks | Overall CPC (EUR) | Completion rate | Cost per completed response (EUR)
Abbreviation: CPC, cost per click.

By the end of the data collection period (May 24, 2023), the survey link had been visited more than 34,000 times. Of those visitors, around two-thirds (63.2%, n = 20,183) provided informed consent, met the minimum age and drinking status criteria for participation, and reached the end of the survey. The attrition rate was lower in the first sample wave than in the second sample wave.

Table 2
sample representativeness at the country level, the high percentage of people with tertiary education in the sample implied a bias in responses and a lack of population groups with lower socioeconomic status, which would compromise further statistical analyses and the generalizability of the study conclusions.The majority of the sample is also composed of women (65.7%), which was observed for all

Table 3 shows the sample size of each country according to the three key sociodemographic characteristics (i.e., sex, age and educational attainment level), and compares the unweighted and weighted sample distributions with the country's respective distributions in 2022, obtained from EUROSTAT (Eurostat, 2022). The number of observations in each group is higher than 100 for all countries, except for adults aged 55 or older in Estonia (n = 46) and Slovenia (n = 48), and middle-aged adults (35-54 years old) in the Netherlands (n = 87). Before weighting, and as compared to the EUROSTAT characteristics, survey samples had a higher proportion of adults under 35 years old and a lower proportion of the remaining age groups for all countries except Sweden. Unweighted education marginal distributions were successfully approximated in Austria, Estonia, and Germany. As for sex, the unweighted samples were never successful approximations of EUROSTAT [...] Latvia (weighted sample: 26.4%, 95% CI = 22.4-30.9%; EUROSTAT: 35.4%). The comparison of the joint distribution of the key socioeconomic characteristics between the weighted sample and country is represented in Figure 2.

TABLE 2 Survey sample characteristics by the wave of data collection.
a n (%).

FIGURE 1 Cumulative number of valid submissions throughout data collection, by country of residence and sample wave, between October 2022 and May 2023.

Sample distribution of socioeconomic characteristics by country of residence and comparison with country distribution according to EUROSTAT.